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Abstract 

In  this  paper,  we  present  the  design  of  an  adaptive  non¬ 
linear  algorithm,  for  estimation  of  Euclidean  position  of 
features  in  a  static  environment  in  the  field  of  view  of  a 
monocular  camera.  The  development  of  the  geometric 
model  and  camera  motion  kinematics  is  based  on  our 
previous  work  in  [2].  This  paper  presents  a  new  alter¬ 
nate  approach  to  estimation  of  3D  coordinates  of  feature 
points  that  is  simpler  in  mathematical  formulation  and 
easier  to  implement. 

1  Introduction 

The  estimation  of  3D  Euclidean  position  of  features  in  an 
environment  from  2D  images  have  applications  in  such  di¬ 
verse  areas  as  autonomous  vehicle  navigation,  visual  servo- 
ing,  3D  modeling,  and  geospatial  mapping.  In  the  computer 
vision  community,  this  is  typically  recognized  as  the  prob¬ 
lem  of  identification  of  Structure  from  Motion  (SFM),  or  Si¬ 
multaneous  Localization  and  Mapping  (SLAM,  [5]),  where 
a  vehicle  such  as  a  mobile  robot  is  required  to  incremen¬ 
tally  build  a  map  of  its  environment  as  well  as  determine 
its  position  as  it  travels  through  the  environment,  scanning 
the  scene  using  a  vision  system.  Although  the  transforma¬ 
tion  of  3D  Euclidean  information  by  a  perspective  camera 
into  2D  image  space  representation  is  inherently  nonlinear, 
the  most  well  known  approaches  are  based  on  linearization 
based  techniques  such  as  extended  Kalman  filtering  [5,  14]. 
Recently,  the  application  of  nonlinear  system  analysis  tools 
towards  a  solution  to  this  problem  have  begun  to  appear 
in  works  from  many  researchers  [1,  8,  9],  utilizing  sliding 
mode  and  adaptive  estimation  techniques  to  build  nonlinear 
observers. 

The  work  presented  in  this  paper  is  an  evolution  from  our 
previous  efforts  [2]  in  the  design  and  implementation  of  a 
nonlinear  algorithm  for  3D  Euclidean  position  estimation. 
Similar  to  our  previous  approach,  the  camera  motion  is  mod¬ 
eled  in  terms  of  the  homography  between  two  different  views 
of  the  environment  captured  by  the  camera.  One  of  the 
views  is  utilized  as  a  constant  image  (corresponding  to  a 
reference  position),  while  the  other  is  a  continuous  temporal 
update  of  the  2D  projection  of  the  environment  as  seen  by 
the  camera  while  in  motion.  It  is  assumed  that  the  cam- 

1  Tli is  work  was  supported  in  part  by  two  DOC  Grants,  an 
ARO  Automotive  Center  Grant,  a  DOE  Contract,  a  Honda  Cor¬ 
poration  Grant,  and  a  DARPA  Contract. 


era  is  calibrated,  and  hence,  the  intrinsic  calibration  para¬ 
meters  are  available.  The  kinematics  of  camera  motion  is 
formulated  based  on  2|D  visual  servoing  work  presented  in 
[13].  A  stack  of  continuous  motion  estimators,  described  in 
detail  in  [3],  is  then  utilized  to  estimate  this  kinematic  sig¬ 
nal  in  terms  of  every  feature  point.  The  result  is  a  stack  of 
camera  velocity  estimates  that  are  scaled  by  the  (constant) 
depth  of  the  individual  feature  points  from  the  camera  rela¬ 
tive  to  its  reference  location.  The  unknown  scale  factors  are 
subsequently  identified  by  using  the  adaptive  least-squares 
technique  [16],  based  on  the  satisfaction  of  a  persistent  exci¬ 
tation  condition.  With  the  unknown  depths  for  every  feature 
point  identified,  the  3D  Euclidean  position  of  all  features  are 
recovered.  This  is  unlike  our  previous  approach,  where  we 
developed  a  stack  of  Euclidean  position  estimators  based  on 
the  image-space  dynamics  for  every  feature  point.  Hence, 
the  mathematical  formulation  of  the  algorithm  is  simpler 
and  more  intuitive.  Experimental  results  obtained  from  the 
same  input  data  utilized  in  our  previous  work2  seem  to  sug¬ 
gest  that  the  new  estimation  algorithm  converges  faster,  in 
a  span  of  a  few  seconds  as  opposed  to  many  tens  of  seconds. 

This  paper  is  organized  as  follows.  A  geometric  model  de¬ 
scribing  the  evolution  of  image  coordinates  of  various  feature 
points  in  the  environment  in  terms  of  camera  motion  is  pre¬ 
sented  in  Section  2.  Section  3  presents  the  kinematics  of  the 
camera  in  terms  of  image  space  and  partially  reconstructed 
Euclidean  information.  The  velocity  estimation  algorithm 
is  described  in  Section  4,  and  is  key  to  the  development  of 
the  3D  Euclidean  position  estimation  algorithm  described  in 
Section  5.  Sections  6  and  7  demonstrate  the  performance  of 
the  position  estimation  algorithm  through  simulations  and 
experimental  results  from  our  test-bed,  respectively. 

2  Vision  System  Geometric  Model 

Figure  1  shows  the  geometric  relationship  between  a  moving 
perspective  camera  and  features  on  a  static  object  in  its 
field  of  view.  The  geometric  model  developed  in  this  section 
is  based  on  two  views  of  the  object  from  the  camera;  one 
of  which  is  from  a  reference  position  denoted  by  X* ,  and 
the  other  is  from  a  time  varying  current  position  denoted 
by  X.  The  origin  of  the  camera  frame  is  assumed  to  be 
coincident  with  the  optical  center  of  the  camera.  Let  Xf  (f)  £ 
R3  represent  the  position  of  the  object’s  body  fixed  frame 

2  A  version  of  the  work  in  [2]  containing  experimental  re¬ 
sults  is  currently  under  peer-review  for  consideration  as  a  journal 
publication. 


(camera  current  position) 


Figure  1:  Geometrical  relationships  between  the  moving 
camera  and  the  object. 


in  (6)  in  terms  of  image  coordinates  of  the  feature  points 
as  captured  by  the  moving  camera,  we  define  the  normal¬ 
ized  Euclidean  coordinates  of  the  feature  points,  denoted  by 
m*  £  R3  as 


A  m,-  *  A  Tn-  /r-?\ 

mi  =  -T7  =  (7) 

The  corresponding  projective  pixel  coordinates  of  the  feature 
points  are  denoted  by  Pi{t),p*  £  R3  as  follows 

Pi=  [  Ui  Vi  1  ] T  p*  =  [  u*  v*  1  ] T .  (8) 

The  image  coordinates  of  the  features  and  their  normalized 
Euclidean  coordinates  are  related  by  the  pin-hole  camera 
model  [7]  such  that 


Pi  =  Arm  p*  =  Am*  (9) 

where  A  £  R3x3  is  a  known,  constant,  and  invertible  in¬ 
trinsic  camera  calibration  matrix.  Utilizing  (7)  and  (9),  the 
relationship  in  (6)  can  be  expressed  in  terms  of  image  coor¬ 
dinates  as  follows 


T*  relative  to  the  camera  frame  X,  and  let  R(t)  £  SO( 3) 
represent  the  orientation  between  the  object  and  the  camera 
frame  such  that  R  :  T*  — >  X.  The  constant  vector  Si  £  R3 
and  the  vector  fhi(t)  £  R3  denote  the  3D  Euclidean  position 
of  the  ith  feature  point  Oi  relative  to  the  object  frame  T* 
and  camera  frame  X,  respectively,  and 

frn=  [  Xi  m  Zi  }T .  (1) 

The  constant  terms  x*f  £  R3,  R*  £  SO( 3)  and  fh*  £  R3  are 
similarly  defined  for  the  camera  at  the  reference  position 
denoted  by  X*.  From  the  geometry  between  the  coordinate 
frames  depicted  in  Figure  1,  it  can  be  seen  that 

fhi  =  Xf  +  Rsi  (2) 

fh*  =  x  *f  +  R*  Si.  (3) 

After  eliminating  the  term  s;  from  (2)  and  (3),  we  have 

fhi  =  Xf  +  Rfh*  (4) 

where  R(t)  £  50(3)  and  Xf(t)  £  R3  are  defined  as  follows 

R  =  R(R*)t  Xf  =  Xf  —  Rx*f.  (5) 

It  is  clear  from  (5)  and  Figure  1  that  R(t)  and  Xf(t)  de¬ 
scribe  the  rotation  and  translation,  respectively,  of  the  cam¬ 
era  frame  X  relative  to  the  reference  position  X* . 

Let  three  non-collinear  feature  points  on  the  object  define 
the  plane  it*  in  Figure  1.  If  the  constants  n*  £  R3  and 
d  £  R  denote  the  normal  vector  and  distance  to  this  plane, 
respectively,  from  the  camera  reference  position  X* ,  then  for 
all  i  feature  points  that  lie  on  the  plane,  (4)  can  be  expressed 
as  follows 

/  fs  .  Xf  «jA  » 

mi  =  R  -\ — T-n  mi 

± _ t _ i  (6) 

H 

where  H  (t)  £  R3x3  represents  a  Euclidean  homography 
[6].  To  facilitate  the  development  of  the  relationship  given 


Pi=  —  a(r  +  xh(n*)T')  A~3  p* 

■  1  (10) 

aa  G 

where  oti(t)  £  R  denotes  the  depth  ratio,  Xh(t)  =  fdr  £  R3 
denotes  the  scaled  translation  vector  and  G(t )  £  R3x3  is  a 
full  rank  homogeneous  collineation  matrix  [12].  Given  more 
than  four  pairs  of  correspondences  ( Pi(t),p *)  coplanar  with 
7r* ,  the  set  of  linear  equations  in  (10)  can  be  solved  for  a 
unique  G(t)  up  to  a  scale  factor  by  using  techniques  such  as 
least  squares  minimization  [12].  This  approach  is  appropri¬ 
ate  for  visual  servoing  in  structured  environments,  where  we 
have  the  option  of  choosing  feature  points  that  are  coplanar. 
In  a  structure-from-motion  algorithm,  we  are  interested  in 
mapping  the  shape  of  objects  in  the  environment  around  the 
camera,  and  feature  points  of  interest  tracked  by  the  vision 
system  will  not  lie  on  a  plane.  The  key  here  is  to  utilize 
the  Virtual  Parallax  algorithm,  proposed  in  [13]  and  also 
described  in  detail  in  our  previous  work  in  [2],  in  order  to 
compute  G(t)  and  ai(t).  Utilizing  this  method,  any  three 
feature  points  on  the  object  can  be  chosen  in  order  to  de¬ 
fine  a  ‘virtual’  plane  7r*.  The  epipolar  constraints  governing 
the  projection  of  the  rest  of  the  feature  points  on  to  this 
virtual  plane  are  then  utilized  to  develop  an  expression  that 
facilitates  the  computation  of  G(t).  The  Virtual  Parallax 
algorithm  requires  at  least  eight  feature  points  in  order  to 
compute  the  matrix  G(t),  which  can  be  subsequently  used 
to  uniquely  determine  H(t)  in  (6),  given  that  the  intrinsic 
camera  calibration  matrix  A  is  assumed  to  be  known  [12]. 
The  matrix  H(t)  can  be  decomposed  to  its  constituent  ro¬ 
tation  matrix  R(t)  and  scaled  translation  vector  Xh(t)  by 
utilizing  the  decomposition  algorithms  described  in  detail  in 
works  such  as  [7,  12]. 

Remark  1  The  reader  is  referred  to  [2]  for  computation  of 
the  homography  matrix  H(t)  and  the  depth  ratios  ai(t)  us¬ 
ing  the  Virtual  Parallax  algorithm  for  non-coplanar  feature 
points. 


3  Camera  Kinematics 


where 


The  translation  and  rotation  kinematics  of  the  camera  frame 
X  relative  to  the  fixed  position  X* ,  denoted  by  ev(t)  £  R3 
and  e^{t)  £  M3,  respectively,  are  developed  in  terms  of  the 
image  coordinates  of  one  of  the  feature  points  (i  =  1  cho¬ 
sen  here  for  the  sake  of  simplicity)  and  the  partially  recon¬ 
structed  Euclidean  information  from  the  decomposition  of 
the  homography  matrix  H(t)  as  follows 

ev  =  [  ui  -  u*  vi-v*  -In (an)  ]T  (11) 

ew  =  u<j>  (12) 

where  ew(t)  is  expressed  in  terms  of  the  axis-angle  repre¬ 
sentation  [17]  of  R(t)  defined  in  (5)  such  that  u(t)  £  R3 
represents  a  unit  rotation  axis,  and  <j>(t)  £  R  denotes  the 
rotation  angle  about  u(t),  assumed  to  be  confined  to  the 
region  —  7r  <  <  n.  After  taking  the  time  derivative  of 

(11)  and  (12),  the  camera  kinematics  can  be  written  in  the 
following  form 

e  —  Jv  (13) 

where  e(t)  =  [  e^(t)  e^(t)  ]T  £  R6  and  v(t)  = 

[  Vc  (t)  ]T  £  R6,  where  vc(t),uic(t)  £  R3  denote 

the  linear  and  angular  velocities  of  the  camera  relative  to 
the  reference  position  X*  but  expressed  in  the  local  frame  X. 
In  (13),  J(t)  £  R6x6  is  the  following  Jacobian- like  matrix 


fi-Ae l  Ael  [m1]> 
03x3  —  Lu 


where  [mi\x  £  R3x3  denotes  the  skew-symmetric  form  of 
rm(t),  03x3  £  R3x3  denotes  a  3  x  3  zero  matrix,  Aei(t)  £ 
R3x3is  a  function  of  the  camera  intrinsic  calibration  para¬ 
meters  and  image  coordinates  of  the  ith  feature  point  as 
shown  below 


0  0  Mi 

Aei  =  A  —  0  0  Mi 

0  0  0 

and  L„(f)  £  R3x3  is  defined  as  follows 


r  a  r  0  r  i  i  i  sine  (<j>)  ,  .2 

= ia  -  2  hx  +  1 - r^\  ^l6) 

V  Sinc2{2  J 


CtiAei  Aei  [mi]> 
03x3  —Lu 


Note  that  the  first  three  elements  of  Vi(t)  £  R6  in  (19)  de¬ 
note  the  linear  velocity  of  the  ith  feature  point  scaled  by  the 
constant  depth  z* .  The  scaled  motion  signal  e.i{t )  £  R6  is 
estimated  utilizing  the  following  continuous  estimator 

e.%  =  /  (Ki  +  /6x6)ei(r)dr  +  /  ptaga  (Si(r))  dr 

J t0  J t0 

+(A,:  + /6x6)ei(f)  (20) 

where  ii(t )  =  ^  e^(t)  e^(t)  ]  £  R6  denotes  the  esti¬ 

mate  of  the  signal  ei(t),  ei(f)  =  ei(t)  —  ei(t)  £  R6  is  the 
estimation  error  in  ei(t),  Ki,pi  £  R6x6  are  positive  defi¬ 
nite  constant  diagonal  gain  matrices,  and  sgn(ei)  denotes 
the  signum  function  applied  to  each  element  of  the  vector 
ei(f).  The  development  of  the  above  estimator  is  described 
in  detail  in  [3].  To  summarize,  it  was  shown  that  the  esti¬ 
mator  in  (20)  asymptotically  identifies  the  signal  ei(f)  (i.e., 
ii(t )  — >  e.i(t )  as  t  — >  00),  provided  that  the  jth  diagonal 
element  of  the  gain  matrix  p,  and  the  jth  element  of  the 
vectors  e)  ( t )  and  ei  ( t )  satisfies  the  following  condition  for 
all  i  feature  points 

[Pi\j  >  |  [*],-)  + |  [*],•!  Vi  =  1,2,  ...6.  (21) 

The  analysis  presented  in  [3]  demonstrated  that  the  estima¬ 
tor  in  (20)  satisfies  the  following  property 

ii{t),ki{t)  €  £ 00  n  £2.  (22) 

The  result  in  (22)  is  critical  for  the  subsequent  Euclidean 
position  estimator  design.  Since  Jfei(t)  is  full  rank,  and  com¬ 
posed  of  known  signals,  the  estimate  for  the  scaled  velocity 

signal,  denoted  by  Vi(t)  =  [  u (t)  j  £  R6  can  be 

obtained  from  (17)  and  (20)  as  follows 

Vi  =  Jki  (23) 

and  as  a  consequence  of  the  result  in  (22),  it  can  be  shown 
[2]  that  the  estimation  error  in  scaled  velocity,  denoted  by 
Vi(t)  =  Vi(t)  —  Vi{t)  £  R6  satisfies  the  following 


where  sinc(<p  (t))  = 


sin  <j>  ( t ) 


v(t) 
and  Vi(t) 


£00  n  c2 

Vi(t)  as  t  — >  00. 


4  Velocity  Estimation 

Unlike  our  previous  approach  [2],  where  we  utilized  the 
image-space  dynamics  for  every  feature  point  in  order  to 
develop  a  stack  of  estimators  for  their  Euclidean  positions, 
the  estimation  algorithm  presented  in  this  paper  is  based 
on  a  stack  of  velocity  estimators;  one  per  feature  point.  To 
facilitate  this  development,  the  kinematic  expression  in  (13) 
is  rewritten  for  every  feature  point  tracked  by  the  vision 
system  as  follows 

e-i  =  JkiVi  (17) 


5  Euclidean  Position  Estimation 

From  (17),  the  kinematic  signal  e.i{t)  for  the  ith  feature  point 
can  be  written  in  terms  of  the  scaled  velocity  of  any  other 
jth  feature  point,  j  ^  i.  Choosing  j  =  1  for  the  sake  of 
simplicity,  it  can  be  shown  that 

ei  —  Jfci diag  (Aj,  A,;,  Ai,  1, 1, 1)  J^\  ei  (26) 


Ai  =  % 


is  the  unknown  depth  ratio,  and  diag(-)  £  R6x6  denotes  a 
diagonal  matrix.  In  terms  of  the  estimates  efit),  the  expres¬ 
sion  in  (26)  can  be  rewritten  in  the  following  manner 

e-i  =  Jfc;diag  (A,:,  A;,  Ai,  1, 1, 1)  Jfi^e i  +  77^  (28) 

where  r/i(t)  £  R6  is  defined  as  follows 

—  tlfcjdiag  (A*,  A^,  A^,  1, 1, 1)  J £i  e*.  (29) 

After  substituting  for  Jki{t)  in  (28)  and  simplifying  the  re¬ 
sulting  expression,  the  translational  kinematics  for  the  ith 
feature  point  can  be  expressed  as 


After  taking  the  time  derivative  of  (38)  and  substituting  the 
adaptive  update  law  in  (35),  the  following  expression  can  be 
obtained 

V  =  Lihxi  ^ \ihii  + 

<  -/3i|Ai|2||/rii||2  +  |/3iAihfi?7t,i|.  (39) 

After  utilizing  the  nonlinear  damping  argument  [10],  the  sec¬ 
ond  term  in  (39)  can  be  upper-bounded  as  follows 

\pi*ihuVvi\  <  ||M|2  +@i  hvif  ,  (40) 


Vi  =  Xihu  +  h2i  +  rjvi  (30) 

where  yfit)  =  evi{t)  £  R3,  yvi(t)  £  R3  is  composed  of  the 
first  three  elements  of  yfit),  and  hu(t),  h2i(t)  £  R3  are  the 
following  measurable  signals 

hn  —  -  e„i  A  Aei  [nri]x  (31) 

h2i  —  Aej  [jRi]  x  ew.  (32) 

In  terms  of  a  subsequently  designed  estimate  A ,:(f)  £  R  for 
the  depth  ratio  A j,  the  estimate  for  yi(t),  denoted  by  yfit)  £ 
R3,  can  be  expressed  as  follows 

yi  =  Xihu  +  h2i.  (33) 

After  denoting  yfit)  =  yfit)  —  yfit)  £  R3  as  the  estimation 
error  signal,  it  can  be  seen  from  (30)  and  (33)  that 

Vi  =  Xihu  +  rjvi  (34) 

where  A i(t)  =  Xi(t)  —  Xi(t)  £  R.  Based  on  the  subsequent 
Lyapunov-based  stability  analysis,  the  depth  ratio  estimate 
A fit)  is  updated  per  the  following  adaptive  least-squares  up¬ 
date  law 

Ai  =  f3iLihJiyi  (35) 

where  (3t  >  1  £  R  is  a  positive  constant,  Lfit)  £  R  is  an 
estimation  gain  that  is  computed  as  follows 

^L.-1  =  hTuhu  (36) 


hence,  the  time  derivative  of  (38)  can  be  upper-bounded  in 
the  following  manner 

V  —  —y-i  |  A; |  \\hu\\2+02i\\yvi\\2  (41) 

where  =  (/3i  —  1)  £  R  is  a  scalar  positive  constant.  From 
(22),  (24)  and  (29),  we  can  show  that  rivi(t)  £  C2  n£oo,  and 
hence, 

[  Mi  I Aj  (r)|  pi;  (t)||“  dr  <  U(0)  -  U(oo)  +  £;  (42) 

J  to 

where  £;  £  R  is  a  positive  constant  such  that 

[  01  \\Vvi  (r)l|2  dr  <  £i.  (43) 

Jt  o 

From  (42),  it  can  be  concluded  that  Xfit)hu(t)  £  C2.  As 
evident  from  (38)  and  (42),  V(t)  <  V(0)  +£  for  any  time 
t,  and  hence,  V(t)  £  C^-,  therefore,  A fit)  £  C^.  Since  the 
term  h\fit)  defined  in  (31)  is  composed  of  bounded  signals, 
Xi(t)hu(t)  £  C oof\C2.  From  (34),  yfit)  £  Coo,  and  it  follows 
from  (35)  that  A;(t),A;(t)  £  Coo ■  Following  the  analysis 
presented  in  [3],  we  can  show  that  the  signal  yfit)  £  Coo , 
and  hence,  from  taking  the  time  derivative  of  (31),  it  can 
be  shown  that  hufit)  £  Coo ■  Hence,  we  have  established 
that  Xi(t)hu(t)  is  uniformly  continuous,  based  on  the  fact 
that  Xfit)h\fit)  and  ^  [xfit)hu(t)  j  are  bounded  signals  [4]. 
Utilizing  Barbalat’s  lemma  [4],  it  can  be  concluded  that 


and  initialized  such  that  Li  x(0)  >  0. 


Xfit)hifit)  — >  0  ast  — >■  00.  (44) 


5.1  Stability  Analysis 

Theorem  1  The  update  law  defined  in  (35)  ensures  that 
X fit)  — >  A,  as  t  — >  00  provided  that  the  following  persistent 
excitation  condition  is  satisfied 

rto+T  _  _ 

7i  <  /  hu(r)hu(r)dr  <  72  (37) 

Jt0 

where  io,F,  7i,72  £  R  are  positive  constants. 


If  the  signal  hifit)  satisfies  the  persistent  excitation  condi¬ 
tion  [16]  given  in  (37),  then  it  can  be  concluded  from  (44) 
that 

A  fit)  — >  0  as  t  — >  oo.  (45) 


□ 


Remark  2  From  ( 31 ),  it  can  be  observed  that  the  persistent 
excitation  condition  in  (37)  is  easily  satisfied  if  the  camera 
has  non-zero  translational  velocity. 


Proof:  Let  V(t)  £  R  denote  a  non-negative  scalar  function 
defined  as  follows 

V  ±  hiL^Xi.  (38) 


Remark  3  A  fit)  provides  an  estimate  for  Jt-  (see  (27)).  If 
z*  is  known  a  priori  from  secondary  measurements,  an  es¬ 
timate  for  the  3D  Euclidean  coordinates  of  the  ith  feature 


point  at  the  reference  position ,  denoted  by  m*(t)  £  R3,  can 
be  obtained  as  follows 

m*(t)  =  Jy-A~1p*.  (46) 

Ai  (t) 

If  the  Euclidean  distance  between  any  two  features  i  and  j 
(i  ^  j)  is  known  (i.e.,  II  fh)  —  then  from  (35),  (46) 

and  the  above  theorem,  it  can  be  seen  that 

lim  — - — A~1p)  —  - 7 — — A~1p*i  =  —  II  fh*  —  fh)  II  .  (47) 
*-»  a  i(t)  p  a  ,-(*) 

The  scale  factor  z)  can  be  recovered  from  the  above  expres¬ 
sion  and  utilized  in  (46)  in  order  to  determine  the  Euclidean 
coordinates. 


6  Simulation  Results 

For  validation  of  the  position  estimation  algorithm  presented 
in  the  previous  section,  we  simulated  a  static  object  with  12 
feature  points  in  the  field  of  view  of  a  moving  camera.  The 
trajectory  for  camera  translational  velocity  was  chosen  as 
follows 

Vc(t)  =  [cos(t),-sin(f),0.1sin(f)]T  (m/s).  (48) 

For  the  sake  of  simplicity,  the  camera  intrinsic  calibration 
matrix  in  (9)  was  assumed  to  be  the  identity  matrix.  The 
image  coordinates  of  the  feature  points  on  the  static  object 
were  updated  utilizing  the  kinematics  presented  in  Section 
3.  The  estimator  gains  were  chosen  through  trial  and  error 
for  all  i  feature  points  as  follows 

K,  =  diag(10, 10, 10, 10, 10, 10), 

Pi  =  diag(l,  1,1, 1,1,1), 

Pi  =  1.5.  (49) 

Figure  2  shows  the  convergence  of  the  inverse  depth  esti¬ 
mates  Ai(f)  for  all  feature  points  on  the  object.  The  estima¬ 
tion  errors  are  shown  in  Figure  3. 

7  Experimental  Results 

Figure  4  shows  a  single  frame  from  a  video  sequence  uti¬ 
lized  to  verify  the  practical  performance  of  the  estimator. 
The  video  was  captured  from  a  monocular  gray-scale  cam¬ 
era  (Sony  XC-ST50)  found  to  have  the  following  intrinsic 
calibration  parameters 

'  797.5  0  312.9  ' 

A  =  0  819.4  231.8  .  (50) 

_  0  0  1 

The  camera  was  mounted  on  the  end-effector  of  a  Puma 
560  industrial  manipulator  arm  and  programmed  to  move 
along  a  smooth  closed  trajectory.  The  camera  was  interfaced 
to  our  vision  system  through  an  Imagenation  PXC200-AF 
framegrabber,  and  triggered  to  capture  images  at  the  rate 
of  20  frames  per  second  through  an  external  time  source. 
We  utilized  an  implementation  of  the  Lucas-Kanade  feature 
tracking  algorithm  [11]  provided  in  the  OpenCV  computer 
vision  library  [15]  to  detect  and  track  feature  points  in  the 


Figure  2:  Convergence  of  scaled  inverse  depth  estimates 
(dimensionless)  for  all  feature  points  on  the  ob¬ 
ject  (simulated  camera  motion). 


time(s) 


Figure  3:  Error  in  depth  estimation  (simulated  camera 
motion) . 

video  stream.  The  implementation  of  feature  tracking  and 
depth  estimation  programs  were  in  C++.  The  estimator 
utilized  the  gains  shown  in  (49).  Table  1  shows  a  compar¬ 
ison  between  the  actual  and  the  estimated  lengths  in  the 
scene.  The  time  evolution  of  the  inverse  depth  estimates  are 
shown  in  Figure  5.  Note  that  the  depth  estimates  converge 
quickly,  in  a  matter  of  a  few  seconds  as  opposed  to  several 
tens  of  seconds  required  by  the  estimator  design  based  on 
pixel  dynamics  in  our  previous  work  [2]. 

8  Conclusions 

This  paper  presented  an  algorithm  to  identify  the  Euclid¬ 
ean  coordinates  of  features  points  in  a  static  environment 
using  a  calibrated  monocular  moving  camera.  The  work 
presented  here  utilizes  an  approach  to  geometric  modeling 
similar  to  our  previous  work  [2],  but  describes  the  develop¬ 
ment  of  an  alternate  nonlinear  estimation  algorithm  which 
the  authors  believe  to  be  simpler  in  mathematical  formula- 


Figure  4:  A  frame  from  the  doll-house  video  sequence  used 
in  the  experimental  validation.  The  white  dots 
are  the  tracked  feature  points. 


time(s) 


Figure  5:  The  time  evolution  of  inverse  depth  estimates 
from  the  experimental  test-bed. 


Object 

Act.  dim.  (cm.) 

Est.  dim.  (cm.) 

Length  I 

23.6 

23.4 

Length  II 

39.7 

40.4 

Length  III 

1.0 

1.0 

Length  IV 

13.0 

13.1 

Length  V 

10.0 

10.1 

Length  VI 

19.8 

19.7 

Length  VII 

30.3 

30.1 

Table  1 :  Estimated  dimensions  from  the  experimental  test¬ 
bed. 


tion,  and  more  intuitive.  The  performance  of  the  algorithm 
has  been  verified  through  simulations  and  an  experimental 
implementation. 
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